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Abstract 

In the a posteriori approach of multiobjective optimization the Pareto front is approximated 
by a finite set of solutions in the objective space. The quality of the approximation can be mea¬ 
sured by different indicators that take into account the approximation’s closeness to the Pareto 
front and its distribution along the Pareto front. In particular, the averaged Hausdorff indicator 
prefers an almost uniform distribution. An observed drawback of multiobjective estimation of 
distribution algorithms (MEDAs) is that - as common for randomized metaheuristics - the final 
population usually is not uniformly distributed along the Pareto front. Therefore, we propose a 
postprocessing strategy which consists of applying the averaged Hausdorff indicator to the com¬ 
plete archive of generated solutions after optimization in order to select a uniformly distributed 
subset of nondominated solutions from the archive. In this paper, we put forward a strategy for 
extracting the above described subset. The effectiveness of the proposal is contrasted in a series 
of experiments that involve different MEDAs and filtering techniques. 
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1 Introduction 

Many real-world optimization problems involve more than one goal to be optimized and are known 
as multiobjective optimization problems (MOPs), i.e. a set of objective functions fi{x ),..., fM{x) 
should be jointly optimized; formally, 

min F{x) = { fi{x),..., fM{x) ); x e S ■, (1) 

where S C M" is known as the feasible set and could be expressed as a set of restrictions over the 
decision or search space K". The image set O C of S produced by the vector-valued function 
F{-) is called feasible objective set or criterion set. The solution to this type of problem is a set of 
trade-off points. The optimality of a solution can be expressed in terms of the Pareto dominance 
relation. 

Definition 1 (Pareto dominance) In the optimization problem 0 and having x,y G S, x is 
said to dominate y (expressed as x -< y) if^j = 1,.. .M: fj(x) < fj{y) and 3i G {1,... ,M}; 
Mx) < fi{y). The non-dominated subset A* of set AQ S is defined as 

A* = {x G A \ ^x' G A : x' ^ x} . 

The solution of Q is 5*, the non-dominated subset of S. S* is known as the efficient set or 
Pareto-optimal set Its image in objective space is known as the Pareto-optimal front, O*. As 
finding the explicit formulation of S* is often impossible, generally, an algorithm solving Q yields 
a discrete non-dominated set, V*, that approximates S*. The image of V* in objective set, VF*, 
is known as the non-dominated front. 

A broad range of heuristic and metaheuristic approaches has been used to address MOPs [T]. 
Of these, evolutionary multiobjective optimization algorithms (EMOAs) [2] have been found to 
be a competent approach in a wide variety of application domains. Alternatively, multiobjective 
estimation of distribution algorithms (MEDAs) were introduced which aim at learning the problem 
structure and characteristics along the run. In this paper, we will experimentally investigate how 
representative MEDAs perform compared to classical EMOAs and how especially MEDAs can be 
improved after the run by means of a postprocessing approach in order to get more equally spaced 
solutions on the final Pareto front approximation. 

The crucial task is how to measure the performance in the multiobjective setting, i.e. how to 
asses the relation of VF* to O*. Several performance indicators have been proposed including the 
hypervolume indicator or the R indicators, see mu for an overview. Each indicator concentrates 
on special desired characteristics of the front approximation while one frequently discussed aim 
is that elements of VF* should be evenly spread along the true Pareto front in order to present 
an unbiased solution set to the decision maker (e.g. m)- On the application side, specifically 
optimization systems or online control could profit from such approximations such that changes 
from one solution to the other are quite moderate and of equal extent. In [5] an approach for 
evolving Pareto front approximations with uniform gap is presented. However, most indicators do 
not address this issue. 

The Ap indicator introduced in [7] specifically favors approximations with the desired characteristic 
based on averaged Hausdorff distances between VF* and O*. Several algorithmic approaches have 
been introduced so far (e.g. liiaiini to generate quite equally spaced Pareto front approximations 
but all of them are focusing on integrating the Ap based subset selection into the EMOA. In this 
paper, we present a strategy to be applied posterior to an approximation run to extract a most 
uniformly distributed subset from the archive of all nondominated solutions generated within the 
course of the used EMOA. 

The remainder of this paper is organized as follows: In Section]^ the methodological background 
is provided regarding multiobjective estimation of distribution algorithms and the postprocessing 
strategy. Experimental results are presented and discussed in Section Conclusions are drawn 
in Section 1^ supplemented by an outlook on further research directions. 
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2 Methodology 

2.1 Multiobjective Estimation of Distribution Algorithms 

The inclusion of learning as part of the search process can improve the performance of evolutionary 
multiobjective optimization algorithms m- It can be argued that learning would allow to grasp 
the characteristics of the problem being solved and, hence, explore the search space in a more 
efficient manner. There are some approaches that perform this task by providing hybrid evolu¬ 
tionary/machine learning methods, like, for example, the learnable evolution model (LEM) [12] . 
Other approaches attempt to infer the fitness landscape of the problem in order to find promising 
search directions Another alternative for carrying out this task is to resort to what 

has been denominated as estimation of distribution algorithms (EDAs) [TB]. This is because of 
EDAs capacity of learning the problem structure. EDAs replace the application of evolutionary 
operators in the offspring generation process with the creation of a statistical model of the fittest 
elements of the population in a process known as model-building. This model is then sampled 
to produce new elements. Nevertheless, the so called multiobjective EDAs (MEDAs) [T7] have 
not lived up to their a priori expectations. This can be attributed to the fact that most MEDAs 
have limited themselves to transforming single-objective EDAs into a multiobjective formulation 
by including an existing multiobjective fitness assignment function. 

Bosman and Thierens [18] proposed the successful multiobjective mixture-based iterated den¬ 
sity estimation algorithms (MIDEAs). They also proposed a novel Pareto-based and diversity¬ 
preserving fitness assignment function. MIDEA considered several types of probabilistic models 
for both discrete and continuous problems. A mixture of univariate distributions and a mixture of 
tree distributions were used for discrete variables. A mixture of univariate Gaussian models and 
a mixture of multivariate Gaussian factorizations were applied for continuous variables. Different 
variants of MIDEAs implement this in a particular form. For example, naive MIDEA the naive 
Bayes algorithm. 

It can also be argued that the covariance matrix adaptation evolution strategies (CMA-ES) [H] 
are also EDAs, as they construct an abstract model of the population and then sample it to 
produce new individuals. GMA-ES provides a method for updating the covariance matrix of the 
multivariate normal mutation distribution used in an evolution strategy [20] . The covariance 
matrix describes the pairwise dependencies between the variables in the distribution. Adaptation 
of the covariance matrix is equivalent to learning a second-order model of the underlying objective 
function. The multi-objective CMA-ES (MO-CMA-ES) [51] is the extrapolation to the multi¬ 
objective domain. 

It has been pointed out that the current model-building algorithms of MEDAs have a set of 
drawbacks that would prevent those algorithms from yielding substantially better results. In 
particular, the tendency of MEDAs loosing population diversity has repeatedly been reported 
[22l [28] . This situation, although already described in single-objective EDAs [24] , is particularly 
dramatic in the multiobjective case, as diversity and homogeneity are among the desired features 
of the final non-dominated set. An analysis of the results yielded by current MEDAs and their 
scalability with regard to the number of objectives leads to hypotheses regarding the issues that 
might be hampering the obtention of substantially better results with regard to other evolutionary 
approaches [5H] . 

The incorrect handling of data outliers is a paradigmatic example of insufficient comprehension 
of the nature of the model building problem. In ‘standard’ machine learning practice, outliers are 
treated as noisy, inconsistent or irrelevant data. Therefore, this type of data is expected to have 
little influence on the model or just to be disregarded. However, that behavior is not adequate 
for model building. In this case, it is known beforehand that all elements in the data set should 
be taken into account as they represent newly discovered or candidate regions of the search space 
and therefore must be explored. Therefore, these instances should be at least equally represented 
by the model and perhaps even reinforced. This situation is further aggravated by the mating 
selection scheme employed in most MEDAs. The continuous selection of the best part of the 
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population might lead to a premature homogenization of the population and, therefore, to the 
stagnation of the search process. In the second case, the loss of diversity can be traced back to the 
above-described outliers issue of model-building algorithms and also to the incorrect estimation 
or sampling of the model. This fact leads us back to the statement referring that model building 
has not been correctly acknowledged as a different problem with particular requirements. 

There have been some works that propose to improve this situation by introducing model building 
algorithms better suited for the task. That is the case of the MIDEA algorithm family, with the 
introduction of the adaptive variance scaling (AVS) and the standard deviation ratio (SDR) pS] . 
The AVS and SDR combination helps fight the early reduction of the mixture densities variances 
and therefore the premature convergence and diversity loss. Another important milestone has 
been the introduction of the anticipated mean shift (AMS) that takes into account the previous 
values of the means of the distribution to “push” solutions towards the Pareto-optimal front. AMS 
has been conjointly used with AVS in the multiobjective adapted maximum-likelihood Gaussian 
mixture model (MAMaLGaM-X) [57j. Similarly, the multiobjective optimization neural EDA 
(MONEDA) [2H] embeds a custom-made model building algorithm [521 to maintain 

diversity by correctly handling the outlier elements. This approach has been improved by the in¬ 
troduction of the match-based learning paradigm of adaptive resonance theory (ART) [30] leading 
to the multiobjective ART EDA (MARTEDA) [3Tj . 

Other approaches propose modifications to currently existing methods. For example, in [53] a 
method that avoids too sudden drops (to zero) of the variances of a multivariate Gaussian model 
is proposed. Similarly, in |32] a permutation sampling is introduced that eliminates the sampling 
errors of UMDA [33]. Other approaches [SIES] have tried strategies to re-inject “fresh” individuals 
that are kept on a evolutionary algorithm that is run in parallel. In spite of these efforts, the issue 
of population diversity and lack of homogeneity is still an open topic that must be addressed. 

2.2 Postprocessing of MEDA results 

Suppose we are particularly interested in a finite size approximation of O* = F(S*) that exhibits 
a sufficiently good spread and a small distance to O*. Previous work in the context of EMOA (see 
e.g. 0) has shown that the averaged Hausdorff distance [7] can be used to achieve this goal. 

Definition 2 Let A,Bg be non-empty finite sets. The value 

Ap{A,B) = max(GDp(A, 5), IGDp(A, R)) with 



for p > 0 is termed the averaged Hausdorff distance between sets A and B where d(u, A) := 
inf{||M — u|| : V G A} for u,v G and some vector norm || • ||. 

Suppose that some MEDA has generated an approximation of the Pareto front for some MOP. 
Typically, this approximation does not yield a finite point set in objective space that is evenly 
distributed. Therefore, we propose the following postprocessing approach: 

1. Run your favorite MEDA that is equipped with a tiny add-on: store a copy of each offspring 
that is generated and evaluated in a hie. 

2. After termination of your favorite MEDA: 

construct an evenly spaced reference front from a given approximation of the Pareto front 
(e.g. from last population of favorite MEDA); then feed each stored offspring into the Ap 
archive updater (see alg. sequentially; the hnal content of the archive A is the desired 
approximation. 
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For bi-objective problems the reference front R can be constructed as follows: calculate a linear 
interpolation from a given approximation of the Pareto front. Since the length L of the resulting 
polygonal line can be derived easily, division by the number of desired archive points m yields 
the size of the equal spacing 5 = L/m which is used to place m points along the polygonal line 
with distance S, where the extreme points of the discretization are moved half the length inwards 
the polygonal line. After the reference front R has been constructed it is used in the Ap archive 
updater to decide which point should be added to or deleted from the archive. 

An update operation can be realized as sketched in alg{^ This naive approach takes 0(|A| • (|A| • 
|i?| • M)) time units, but there exists a quick update version [S] that only needs 0(|A| • \R\ ■ M) 
time units. 


Algorithm 1 Naive Ai-update [9] 

Require: archive set A, reference set i?, new element x 

1: A= ND/(AU{£c},^) 

2: if |A| > Afl := |i?| then 
3: for all a G A do 

4: /i(a) = Ai (A \ {a}, ii) 

5: end for 

6: A* = {a* G A: a* = argmin{/i(a) : a G A}} 

7: if |A*| > 1 then 

8: a* = argmin{GDp(A \ {a}, R) : a G A*} 

9: end if 

10: A = A\{a*} 

11; end if 


The most obvious order of feeding the stored pairs {x, F{x)) into the archive updater is the order 
of their generation. We call this the ‘forward update.’ In this manner, many individuals will pass 
the initial dominance check, so that subsequent Ap calculations are necessary. Some time saving 
may be achieved by feeding the stored pairs into the archive update in inverted order. We call this 
the ‘backward update.’ Since points that have been generated in later iterations of the MEDA 
are more likely to dominate previous points, most points from the rear of the inverted sequence 
will probably not pass the initial dominance check, so that subsequent Ap calculations can be 
avoided. Since the order of the points presented to the archive clearly affects the hnal outcome of 
the archive, we shall compare both approaches experimentally in the next section. 


3 Experiments 

3.1 Experimental Setup 

In order to experimentally evaluate the Ap archive approximation for multiple multiobjective 
EDAs, we implemented two offline strategies in the JAVA"'’^ jMetal framework [31]: a forward 
Ap-approach {+fDP), starting with the initial population and iteratively updating with previ¬ 
ously traced generations, whereas the backward-strategy {+bDP) starts with the final population. 
Within the Ap-update mechanisms, a linear interpolation and a PSA-based strategy, both using 
the aggregated solutions of the whole run, were used for deriving a set of evenly spaced reference 
points. Based on that set, a solution was eventually added to the archive. 

We then used well known MOPs for benchmarking [5]: an instance of the sphere problem with 
A C and a connected convex Pareto front, the DENT problem with A C and a connected 
convex-concave-convex front, ZDT3 with A C and a disconnected convex/concave front, as 
well as WEGI with ACK^+^, also with a convex-concave front. 

Then, reference fronts covered by 1,000 uniformly spaced points were created in order to compare 
our Ap approximation quality for all algorithms. With the exception of WFGl, we were able to 
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MARTEDA 

F2 vigilance threshold (p) 

0.05 

Initial standard deviations {(p) 

0.01 

Selection percentile (a) 

0.3 

Pf to N* ratio ( 7 ) 

0.5 

Substitution percentile (cd) 

0.25 

MONEDA 

Number of initial GNG nodes (Nq) 

2 

Maximum edge age (i^max) 

40 

Best node learning rate (cb) 

0.1 

Neighboring nodes learning rate (e^) 

0.05 

Insertion error decrement rate {(5i) 

0.1 

General error decrement rate ((5 g) 

0.1 

Accumulated error threshold (p) 

0.2 

Selection percentile (a) 

0.3 

Vt to Af„ax ratio ( 7 ) 

0.5 

Substitution percentile (cu) 

0.25 

Naive MIDEA 

Selection percentile (r) 

0.3 

Diversity percentile (<5) 

15 

Number of parents of a variable (k) 

2 

Maximum number of clusters 

rO.SLxnpopJ] 

Threshold for the leader algorithm 

0.1 

MO-CMA-ES 

Initial step size (cr) 

1 

Damping for step-size (d) 

1 + ripop /2 

Target success rate 

0.181 

Cumulation time horizon 

2 /(npop + 2 ) 

Covariance learning rate 

2 /(nppp -h 6 ) 

Threshold success rate 

0.44 


Table 1: Parameters of the algorithms used in the experiments following the notation of the corresponding 
author(s). 

find a parametric form for exactly calculating the optimal fronts’ length L (by rectification), for 
all benchmark problems. Given that form 1,000 points were placed uniformly on the PF using 
distance AL = L/1,000 between points and starting with the extreme points moved inwards by 
AL/2. In case of WFGl [37], the neighborhood of the known Pareto set was explored using 
a very fine grid (per dimension of the decision space), in order to find a PF. Then 1,000 well- 
distributed solutions among the non-dominated solutions were kept. As the number of reference 
solutions is by far larger than the approximated set’s size, a not perfectly uniform distribution of 
the Pareto-optimal solutions is acceptable. 

Our evaluation is based on four state-of-the-art general purpose EMOAs and four MEDAs. From 
the former group, we employed the NSGA2 [^ and SMS-EMOA [35] with standard parameter 
settings - SBX crossover with = 0.9 and polynomial mutation with Pmut = 1/n - as well 
as PSEMOA as special purpose EMOA with inherent internal clustering for generating evenly 
distributed solutions. For the same reason we also used a modified version of NSGA2 [33] - the 
sequential crowding distance NSGA2 (SCD-NSGA2) - which successively alternates between re¬ 
moving the individuals with the smallest crowding distance and recomputing the crowding distance 
values, until only p solutions are left. 

Considering the MEDAs we deal with two well-known approaches: the naive MIDEA [18] and the 
multiobjective CMA-ES (MO-CMA-ES) ]3T]. We also include MONEDA [35] and MARTEDA 
m as they are supposed to have a better handling of diversity. The parameters of the MEDAs 
are summarized in Table [T] 

During our experiments, all test problems were optimized 20 times by all algorithms, given a 
budget of 50,000 function evaluations each and population sizes p G {10, 20,100}. The data, 
which were generated from the optimizers’ runs, have been submitted to the offline Ap archivers. 
Both update strategies were parameterized with p G {1,2}. 
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Figure 1: Comparison of MEDAs with EMOAs regarding Hypervolume. 


3.2 Results 

At first, overall algorithm behavior is investigated by comparing the MEDAs with the classical 
EMOA approaches in terms of the Dominated Hypervolume indicator (HV, |1]) as well as Ap 
(Figures and based on the generated reference fronts as described in Section [O] It becomes 
obvious that the MEDAs lead to very stable results over the repeated runs which is reflected by 
both indicators while no MEDA is superior to the others. Moreover, for all test problems the 
respective performance is comparable to the best performing algorithm out of the classical EMOA 
set. While over all test problems but WFGl PSEMOA and SCD-NSGA2 can be considered 
as best regarding Ap, the SMS-EMOA unsurprisingly outperforms the other classical EMOA 
in terms of HV as it internally optimizes the HV indicator. The test problem WFGl results in 
different algorithm rankings within the classical EMOA, i.e. NSGA2 clearly is best regarding both 
indicators. The stability of MEDA results is very noticeable here as the variability of PSEMOA, 
SMS-EMOA as well as SCD-NSGA2 is large. Goncluding, the MEDAs are at the least competitive 
with the classical EMOAs on the considered test problems. 

In Figures|^-[^results of the postprocessing strategies applied to the MEDA variants are visualized 
in terms of Ap regarding the generated reference fronts for population sizes p, S {10, 20,100}. In 
general, experimental results become more distinguishable with increasing population size but the 
basic findings are reflected for the smallest population size as well. However, varying p does not 
have a noticeable effect in this setting, thus the analysis concentrates on p = 1 for illustration 
purposes. 

Clearly, the postprocessing improves the final approximation quality of the MEDA’s run as the 
primary algorithm in terms of Ap for all population sizes. As the Pareto front of ZDT3 is discon¬ 
nected, the performance of the considered postprocessing differs from the remaining test problems. 
Here, the two PSA-based strategies {+{fPSA) and +{bPSA)) outperform the archive strategy 
using linear interpolations of the set of all nondominated points generated in the course of the 
algorithm run {+{fDP) and +{bDP)). This is intuitive as the latter approach ignores the discon¬ 
tinuity in the interpolation and tends to place reference points also in unattainable regions m- 
Thus, this effect increases with the population size as more points have to placed on the linear 
interpolation of the front. For /i = 10 the strategies +{fDP) and +{bDP) are comparable or 
even slightly better than the PSA-based ones. For all other test problems +{fDP) and +{bDP) 
generate the largest improvement in terms of Ap. Differences are statistically significant in most 
settings reflected by a Wilcoxon rank sum test {p < 0.05) and annotated with a * in Figures [^-[^ 
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'' >gr 










sSf' 


Interestingly, for /i = 100 and WFGl differences between the primary MEDA and all postprocess¬ 
ing variants are very small and not statistically significant. This is due to the special challenges the 
test problem poses onto the algorithms. Thorough analysis of the final approximation sets of the 
MEDAs revealed that those runs do not allow for substantial improvements of the final solution 
sets as the algorithm performance is not good enough and the archive of solutions does not provide 
the means for selecting a better subset regarding Ap. However, substantially increasing the bud¬ 
get of the MEDAs will lead to results showing the same tendencies as described above for smaller 
population sizes. As expected, the direction of the archive update strategy has no influence on the 
final approximation quality. However, our experiments revealed that the backward strategy which 
starts from the final population of the primary MEDA requires much less updating iterations until 
the selected subset becomes stable than the forward strategy. Therefore, we recommend to favor 
this approach over the forward update in general such that the postprocessing will be much more 
computationally efficient. 
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Figure 3: Ap- and PSA-postprocessing applied to MEDA results for p = 10 and p = 1. 
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Figure 4: Ap- and PSA-postprocessing applied to MEDA results for p = 20 and p = 1 
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Figure 5: Ap- and PSA-postprocessing applied to MEDA results for /i = 100 and p = 1. 

4 Conclusions and Outlook 

Within this work, we first compared the quality of the solutions of four evolutionary multiobjective 
optimization algorithms (NSGA2, PSEMOA, SCD-NSGA2, SMS-EMOA) and four multiobjective 
estimation of distribution algorithms (MARTEDA, MO-CMA-ES, MONEDA, Naive MIDEA) on 
a set of well-known multiobjective test problems. Based on two considered indicators - HV and 
Ap - it could be shown that the performance of the MEDAs is at least competitive to classical 
EMOAs. Also, the MEDAs itself performed quite similar to each other. 

Due to those competitive and thus promising results, we analyzed whether the performance of the 
latter ones regarding Ap could even be improved by a subsequent application of either forward or 
backward Ap-update strategies which will lead to a more uniformly distributed solutions along the 
approximated Pareto front. Experiments confirmed this hypothesis and apparently the influence 
of using either one of those update directions is negligible. On the other hand, the Ap-based 
techniques outperform the PSA-based ones besides for problems with disconnected Pareto fronts. 

In future work, one might want to improve the MEDAs performance either based on the PSA- or 
Ap-update strategy by integrating that approach into the algorithm’s mechanics. This hopefully 
leads to algorithms, which generate comparable or even better solutions within a single run and 
therefore without employing any additional postprocessing. 
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